Improving Thai Word and Sentence Segmentation Using Linguistic Knowledge
Similar resources
Chinese Word Segmentation Using Minimal Linguistic Knowledge
This paper presents a primarily data-driven Chinese word segmentation system and its performance on the closed track, using two corpora, at the First International Chinese Word Segmentation Bakeoff. The system consists of a new-word recognizer, a base segmentation algorithm, and procedures for combining single characters, combining suffixes, and checking segmentation consistency.
Improving Word Alignment Quality Using Linguistic Knowledge
Word alignment of bilingual parallel corpora is usually generated using only statistical information. External linguistic information, such as a dictionary or linguistic structural annotation of the texts, is rarely used despite its usefulness. Additionally, to our knowledge, it has never been examined systematically how linguistic information can be employed to improve word alignment. In ...
Thoughts on Word and Sentence Segmentation in Thai
This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation are caused mostly by compound words. To set a standard resource and tool for word segmentation, we suggest that only simple words and true compound words should be segmented in the process of word segmentation. Other compounds can be grouped later by the same means as multiword identific...
Collocation and Thai Word Segmentation
This paper presents another approach to Thai word segmentation, which is composed of two processes: syllable segmentation and syllable merging. Syllable segmentation is done on the basis of trigram statistics. Syllable merging is done on the basis of collocation between syllables. We argue that many word segmentation ambiguities can be resolved at the level of syllable segmentation. Since a...
Thai Word Segmentation Verification Tool
Since Thai has no explicit word boundary, word segmentation is the first thing to do before developing any Thai NLP applications. In order to create large Thai word-segmented corpora to train a word segmentation model, an efficient verification tool is needed to help linguists work more conveniently to check the accuracy and consistency of the corpora. This paper proposes Thai Word Segmentation...
Journal
Journal title: IEICE Transactions on Information and Systems
Year: 2018
ISSN: 0916-8532,1745-1361
DOI: 10.1587/transinf.2018edp7016